Minimum chi-square method for estimating population size in capture-recapture experiments

Closed population capture-recapture estimation of population size is difficult under heterogeneous capture probabilities. We introduce the minimum chi-square method which can handle multi-occasion capture-recapture data. It complements likelihood methods with elements that can lead to confidence intervals and assessment of goodness-of-fit. We conduct a comprehensive study on the minimum chi-square method for estimating the size of a closed population using multiple-occasion capture-recapture data under heterogeneous capture probability. We also develop two different bootstrap techniques that can be combined with any underlying estimator, be it the minimum chi-square estimator or a likelihood estimator, to perform useful inference for estimating population size. We present a simulation study on the minimum chi-square method and apply it to analyze white stork multiple capture-recapture data. Under certain conditions, the chi-square method outperforms the likelihood based methods.


Introduction
Capture-recapture methods are commonly used in wildlife studies to estimate population size by using information from marked individuals.The Lincoln-Petersen method [1] is the simplest method for estimating a population size with capture-recapture data.Many authors have proposed extensions to model capture-recapture data from a closed population (where the population is closed to births/deaths and immigration/emigration) [2], including developments to incorporate capture probabilities that vary over individual (heterogeneous capture probabilities).For example, [3] suggested using the full and conditional likelihood methods to estimate population size; [4] recommended using a jackknife estimator; [5] proposed a conditional likelihood estimator with covariates; [6] suggested likelihood estimation with finite mixture models; and [7] investigated a Conway-Maxwell-Poisson estimator as it allows different levels of heterogeneity adaptively.
Link [8] described the nonidentifiability of a population size N as the case where populations of very different sizes give rise to roughly the same observed capture history, which implies when we have only the capture history as we normally do in a capture-recapture experiment, the underlying population size cannot be accurately determined.Link (2003) brought forth the nonidentifiability problem for population size N from the standpoint that N could not be identified based on the conditional distribution of the observed data.He tried maximizing the full likelihood function and maximizing the conditional likelihood function to estimate population size.In essence, the nonidentifiability problem refers to the over-dependence; in the sense that for a given set of data, the estimated value N can be vastly different when the underlying capture probability distribution g(p; θ) changes.In recent wildlife studies, researchers have focused on the distribution of capture probability p to model the heterogeneities and estimate the abundance under heterogeneity.The authors of [9] utilized a logit-normal model for p and [10] recommended using a beta model for p.The problem of nonidentifiability under heterogeneity still remains when setting high detection probabilities, restricting the number of parameters in models, or obtaining a large number of sampling occasions [11].
We explore the minimum chi-square method and compare it to two existing likelihood methods for estimating population size using multi-occasion capture-recapture data, and we apply these to estimate the white stork (Ciconia ciconia) population size in their main European wintering area, southwestern Spain.A white stork data set was obtained from [12], who focused on analyzing individual consistency in the use of food subsidies.Individuals within the study population were detected daily throughout the wintering season lasting 80 days with the additional state information referring to the foraging categories (dumps or rice fields) in which the individuals were detected.The authors of [13] commented that there were about 4000 white storks wintering in the study area when the capture history data were recorded.This number is based on census data collected and synthesized by scientists from the Spanish Ornithological Society in collaboration with Birdlife International.We give complementary estimates based on the three statistical methods of estimation.The existing likelihood methods that we use are the full and conditional likelihood methods.The minimum chi-square method is a recognized alternative to likelihood based methods [14].It has been applied to capture-recapture data on a small humpback whale study assuming constant capture probability [15] and was investigated by [16] in the context of time varying capture probabilities.Authors of [17] mentions the possibility of using the minimum chi-square method in a three sample closed population study, but never applied it to their study.We are interested in this method because it lends itself naturally to the multi-occasion capture-recapture data which are categorical data that can be directly handled by this method without discretization.Further, a major advantage to the minimum chi-square method is that it provides point estimation, interval estimation and goodness-of-fit assessment at the same time.We adopt this method for multi-occasion capture-recapture data and show through the white stork data analysis that it is a competitive alternative to existing methods.
The rest of this paper is organized as follows.In Section 2, we first give a brief review of the full and conditional likelihood methods for multi-occasion capture-recapture data, and then discuss and adopt the minimum chi-square method for such data.We also propose two bootstrap techniques for multi-occasion capture-recapture data which can be used to construct confidence intervals for the population size.In Section 3, we present a simulation study to examine the finite sample properties of the minimum chi-square statistic which depend on the underlying population size and sample size.Due to our interest in estimating the white stork population size, we will focus on the case where the population size is not too large.We then apply the likelihood and the minimum chi-square methods to estimate the white stork population size in Section 4 and conclude with some remarks in Section 5. We also studied the accuracy and robustness of the minimum chi-square method when the underlying population is much larger than the white stork population.Results concerning such large population cases are given in the S1 File.

The full and conditional likelihood methods
We review the full and conditional likelihood methods by following the discussion about these in [11].Let N be the unknown population size, X 1 , X 2 , . .., X N be independent random variables where X i is the number of times individual i is observed, and T be the number of sampling occasions.Define a vector of frequencies f = (f 0 , f 1 , . .., f T ) 0 where f x = #(X i = x) for x = 0, 1, 2, . .., T and let p 1 , p 2 , . .., p N be the capture probabilities of the N individuals.The distribution of X i given p i follows a binomial distribution with a total number of trials T and a probability of success p i , so we write [X i |p i ]* Binomial(T, p i ).We assume p 1 , p 2 , . .., p N are independent and identically distributed random variables with distribution function g(p; θ) with unknown parameter vector θ.Here, g(p; θ) is a probability density function that has support [0, 1] when p is continuous.We use the following capture probability models for heterogeneity: (a) beta distribution: g(p; θ) = g(p; α, β) where α and β are parameters and (b) logitnormal distribution: g(p; θ) = g(p; μ, σ) with parameters μ and σ.
Missing from capture-recapture data is f 0 , the number of individuals who are never captured during the T sampling occasions.Since f x is the number of individuals captured x times, we only know f x for x = 1, 2, . .., T, and f 0 is unknown.With this notation, the number of individuals captured at least once can be expressed as n ¼ P T x¼1 f x .If the probability density function g(p; θ) of p is known, then the probability that a randomly selected individual will be captured x times can be expressed as It is clear that f 0 = N − n and that f = (f 0 , f 1 , f 2 , . .., f T ) 0 follows a multinomial distribution with (T + 1) cells and the corresponding cell probabilities are π g = (π g (0), π g (1), . .., π g (T)) 0 , where π g (x) is the probability of an individual sighted x times.More precisely, with a total population size of N and T sampling times, the distribution of the random vector of observed frequencies f is as follows [f]* Multinomial T+1 (N, π g ).However, the multinomial distribution of f cannot be determined since f 0 is unknown.Instead, we consider the distribution of the observable frequencies by conditioning on the total number of individuals captured, n.We denote the observable frequencies by f c = (f 1 , f 2 , . .., f T ) 0 , and write π c g ¼ ðp c g ð1Þ; p c g ð2Þ; . . .; p c g ðTÞÞ 0 where p c g ðxÞ denotes the probability that an individual is sighted x times conditional on the individual sighted at least once.The conditional cell probabilities π c g are calculated based on the unconditional probabilities π g as follows p c g ðxÞ ¼ p g ðxÞ=f1 À p g ð0Þg; x ¼ 1; 2; . . .; T: The conditional distribution of f c given n is a multinomial distribution with a total of n trials and T cell probabilities π c g , that is, [f c |n]* Multinomial T ðn; p c g Þ.The number of individuals n that are captured at least once follows a binomial distribution with a total of N trials and a probability of success 1 − π g (0).Thus, [n]* Binomial(N, 1 − π g (0)).Finally, it follows from the above that the distribution of f can be expressed as The full likelihood of N and the unknown parameters can be written as , which can be used to estimate N and θ simultaneously.If we condition on n, then N can be removed from the full likelihood and the resulting conditional likelihood for θ can be expressed as . .p c g ðTÞ f T : Here, elements of p c g ðxÞ are computed based on π g (x) and π g (0) where π g (x) is dependent on the underlying capture probability distribution.
To use the conditional likelihood to estimate the population size N, we first find the maximizer θ of the conditional likelihood.With this estimated value of θ, the distribution function gðp; θÞ is available and the probability π g (0) can then be estimated.It follows that the population size N can be estimated using a Horvitz-Thompson type estimator [11] as N ¼ n=f1 À pg ð0Þg: This conditional likelihood method and the full likelihood method are asymptotically equivalent [3].Therefore, they often give similar results when the number of individuals captured at least once (n) is large.
We note that all simulations in Section 3 and analyses of the white stork data in Section 4 were done using R software [18].

The minimum chi-square method
Minimum chi-square estimation has a long history in statistics; see, e.g., [19,20].We now discuss this method and extend it to handle population size estimation with multi-occasion closed-population capture-recapture data.
Suppose Y is a multinomial random variable supported on {y 1 , y 2 , . .., y K } with probability mass function π(y i ; θ) where y i 's are fixed constants, K � 2 is a known integer, π(y i ; θ)>0 and ∑π(y i , θ) = 1, and θ is a d-dimensional parameter vector.Let θ t 2 R d denote the true but unknown value of the parameter vector.For a given value θ 2 R d , consider testing the null hypothesis H 0 : θ t = θ with a random sample of n observations of Y. Let O i denote the number of times y i appears in the sample.Then, O i is the observed value and E i (θ) = nπ(y i ; θ) is the expected value under H 0 .The Pearson chi-square statistic for the null hypothesis is given by Under H 0 , X 2 (θ) has an asymptotic chi-square distribution with (K − 1) degrees of freedom, so to test the null hypothesis we simply compare the observed value of the X 2 (θ) statistic with a chosen χ 2 critical value.Those values of θ not rejected by the test form a region which [20] called the consonance region for θ t .Denote by R 1À a the 100(1-α)% consonance region for θ t .Then, where w 2 a;KÀ 1 is the (1 − α)th quantile of the w 2 KÀ 1 random variable.This concept was first introduced by [21] who were more interested in how consonant the data are with the probability model π(y i ; θ) than estimating the unknown parameter vector θ.We will also use the name consonance region to differentiate this region from other types of confidence regions.However, since our main interest is in parameter estimation, we will continue to call the quantity 100(1 − α)% associated with the consonance region its confidence level.
We now compare the consonance region (2) with the classical confidence region based on the asymptotic distribution of a maximum likelihood estimator for θ t .The following are three key comparisons.
(i) The construction of the consonance region (2) does not require a point estimator.It is obtained by inverting the Pearson chi-square test in that it consists of θ values not rejected by the test at level-α.Apart from the Pearson chi-square test, other goodness-of-fit tests may also be used to derive consonance regions.For example, in [20], the Anderson-Darling test was also used to derive consonance regions.For discrete data such as the capture-recapture data, the chi-square test is a preferred natural choice.
(ii) The classical confidence region can be constructed for any confidence level (1 − α) 2 (0, 1).For the consonance region, there is a lower bound on the confidence level; consonance regions with levels below this bound are empty.To see this, let θ ¼ argmin θ fX 2 ðθÞg; and let α* = P ðw 2 KÀ 1 � X 2 ð θÞÞ.Then, X 2 ð θÞ is the (1 − α*)th quantile of the w 2 KÀ 1 random variable.A consonance region with a confidence level (1 − α) < (1 − α*) is empty because there are no θ values satisfying X 2 ðθÞ � w 2 a;KÀ 1 as the smallest X 2 (θ) value is X 2 ð θÞ ¼ w 2 a * ;KÀ 1 , which is greater than w 2 a;KÀ 1 because α > α*.This lower bound, as [20] pointed out, is not a cause for alarm and should be viewed as useful information.For example, suppose (1 − α*) = 0.91 so that a 90% consonance region is unavailable.This tells us that no model can pass the goodness-fit-test at the 10% level and this information is worth knowing.We note that a consonance region with a confidence level that happens to be below the bound derived from a particular sample is still valid; if the assumptions are all correct it may not be empty for the next sample and it will capture the true value 100(1 − α)% of the time when it is constructed repeatedly using independent random samples.When it is empty, it informs us that it is one of the 100α% times where the consonance region does not capture the true value.This is an advantage as none of the classical confidence regions that do not contain the true value would identify themselves as such.
(iii) Classical confidence regions based on the asymptotic distribution of a point estimator are nested in that if (1 − α 1 ) < (1 − α 2 ), then a region with confidence level (1 − α 1 ) is fully contained by one with confidence level (1 − α 2 ).The point estimate is the only point in the intersection of all confidence regions with confidence level (1 − α)>0.Consonance regions are also nested.
To extend the method of [20] to estimate N t , the true value of N, we assume that g(p; θ) is correctly specified although the true value of its parameter θ t is unknown.Let π(i; θ) be the marginal probability that a randomly selected individual from the population will be sighted exactly i times.Let G i = {j: X j = i} for i = 0, 1, . .., T.Then, (a) G i represents the group of individuals that have been observed on exactly i of the T occasions and (b) the G i form a partition of the population.Let O i denote the number of individuals in G i .Then, (b) implies O 0 + O 1 + . . .+ O T = N t .In a capture-recapture experiment, we do not know the number of individuals in G 0 and thus O 0 is not available.But if N t is given, we can compute O 0 by using the above equation O 0 ¼ N t À P T i¼1 O i : Now consider testing the hypothesis H 0 : N t = N versus H 1 : N t 6 ¼ N using the observed O 1 , O 2 , . .., O T .For the time being, assume all O i � 5 and T > d, where d is the number of parameters for the underlying capture probability distribution g(p; θ).Possible violations of these assumptions will be addressed in the simulation study section.Define where which has an asymptotic chi-square distribution with (T + 1) − 1 − d = T − d degrees of freedom under H 0 .That the chi-square statistic (3) defined by θN has degrees of freedom T − d, instead of (T + 1) − 1, was first shown in [22].See [23] for an alternative estimator to θN that also leads to T − d degrees of freedom for the X 2 statistic.It follows from [20] that a 100 This also implies that N is the only point in the intersection of all non-empty consonance sets for N t .By point (iii) in the comparison of consonance regions and classical confidence regions in the previous section, we see that N corresponds to the point estimator for N t .Hence, we use N as a point estimator for N t and call it the minimum chi-square estimator of the population size.The theoretical properties of N are difficult to obtain and its variance is presently not available.Fortunately, the associated consonance set R N 1À a is available which reduces the need for its standard error.To construct a consonance region for the unknown parameter vector θ t , consider testing the hypothesis H 0 : θ t = θ and H 1 : θ t 6 ¼ θ using the chi-square statistic where E i (θ) = Nπ(i; θ), N t is the true population size and O 0 ¼ N t À P T i¼1 O i .Under H 0 , X 2 (N t , θ) has a chi-square distribution with T degrees of freedom.It follows from [20] that a 100(1 − α)% consonance region for θ t is fθ 2 R d : X 2 ðN t ; θÞ � w 2 a;T g ð5Þ where w 2 a;T is the (1 − α) quantile of the w 2 T random variable.Since N t is unknown, we replace it with N θ ¼ argmin N fX 2 ðN; θÞg and replace (5) with 1À a is a conservative 100(1 − α)% consonance region for θ t as its coverage level is more than 100(1 − α)%.Simulation results show that its coverage level is close to 100(1 − α)% in many applications.Following the same argument used to show that N is in all non-empty consonance sets R θ 1À a , we can show that θ in ð N ; θÞ ¼ argmin ðN;θÞ fX 2 ðN; θÞg is in all non-empty R θ 1À a .This implies θ is the "center" of each R θ 1À a and justifies its use as a point estimator for θ t .We call θ the minimum chi-square estimator for θ t .
Finally, we use the chi-square statistic X 2 ð N ; θÞ as a measure of goodness-of-fit for the heterogeneity model g(p; θ).It measures how consistent the particular combination of N and gðp; θÞ is with the observed data O 1 , O 2 , . .., O T .Since N ¼ N represents the most favourable N value under which to evaluate the goodness-of-fit of g(p; θ) (in the sense that X 2 ð N ; θÞ is the smallest possible value), the use of this statistic as a goodness-of-fit measure for g(p; θ) is favourable to the model.To test the null hypothesis that the true heterogeneity model is gðp; θÞ with X 2 ð N ; θÞ, a reasonable calibration of its limiting distribution is w 2 TÀ d since under the null hypothesis X 2 ðN t ; θÞ � w 2 TÀ d and N estimates N t .If we use this limiting distribution and thus the p-value p = P ðw 2 TÀ d � X 2 ð N ; θÞÞ for testing, then the type-I error may be slightly lower than the significance level α.This is because X 2 ð N ; θÞ � X 2 ðN t ; θÞ, and thus the p-value given above is larger than it would be if N t had been known which leads to fewer rejections under H 0 .

Bootstrap techniques
The empirical bootstrap is a commonly used statistical resampling technique, developed by [24], to estimate the variation of point estimates without making strong distributional assumptions.The key idea of bootstrapping is to make inference about a population based on the sample data, which can be modelled by resampling the sample data and performing inference about a sample from resampled data.It is now widely recognized as an adequate variance estimation method for capture-recapture studies [25].In this section, we will discuss two different bootstrap methods, one is resampling from the same estimated distribution, which we refer to as the parametric bootstrap, and another is resampling of individuals from the whole population, which we refer to as the bootstrap of individuals.Bootstrap of individuals is similar to one of the methods discussed in [25]; however, they were concerned with constant capture probability.These two methods are analogous to the third method and the first method, respectively, in [26].Rather than providing an estimate of variance, we focus on constructing a confidence interval for estimated population size.A parametric bootstrap approach was adopted by [27] based on maximum likelihood estimation to construct an interval estimate of the population size.We study the performance of the two bootstrap methods for constructing confidence intervals of the estimated population size N under minimum chi-square estimation through a simulation study.A comparison of confidence intervals based on the proposed bootstrap techniques for the white stork population size is presented in the application section.The accuracy of the consonance region such as (4) depends on the accuracy of the χ 2 approximation to the final sample distribution of X 2 ðN; θN Þ which may be poor when the sample size is small.The bootstrap methodology provides an alternative means of interval estimation in such situations.
Parametric bootstrap.To obtain a parametric bootstrapped confidence interval with multi-occasion capture-recapture data, we first make an assumption about the underlying capture probability model, say we assume it is a beta distribution, without specifying its parameter vector θ.We then estimate the population size N and θ using the data through either the minimum chi-square method or likelihood based methods.Let ð N ; θÞ be the estimated parameter values.We now have an (estimated) population of size N with a capture probability distribution fully specified by θ.This is our resampling distribution from which we generate a large number of bootstrap samples, say m samples.Here, each bootstrap sample is a multi-occasion capture-recapture data set with the same number of occasions T as the original data.Each sample is obtained by first generating a random sample of size N (integer) from the capture probability distribution, say fp 1 ; p 2 ; . . .; p N g and then generating N binomial random numbers B i * Binomial(T, p i ) for i ¼ 1; 2; . . .; N .The p i represents the capture probability of the i th individual in the population and B i represents the number of times it is observed during the T occasions.From these m samples we obtain m bootstrap estimates of N, say fN ? 1 ; N ? 2 ; . . .; N ?m g.Finally, we obtain a 95% parametric bootstrapped confidence interval for the population size N which is defined by the 2.5 th and 97.5 th percentiles of these estimates.

Bootstrap of individuals.
An alternative to the above parametric bootstrap is to bootstrap the individuals (see [7]) as follows.With a set of multi-occasion capture-recapture data f c = (f 1 , f 2 , . .., f T ) 0 , we again first estimate the population size under an assumption about the capture probability distribution.Let N be the estimated integer value.We now have an (estimated) population of size N consisting of individuals in the data set and those that were never observed.Let n denote the total number of individuals that have been observed at least once.
Then, the number of individuals never captured may be estimated by f 0 ¼ N À n.The estimated population can now be characterized by f ¼ ð f 0 ; f 1 ; . . .; f T Þ 0 , that is, the population contains f 0 0's, f 1 1's, and so on.Next, we take a random sample of size N from this population with replacement.Since each individual is replaced before the next individual is drawn, some individuals may appear more than once, and others may never appear at all.Each sample is a bootstrap replicate of the original capture-recapture data.We repeat this process m times, obtaining m sets of such data with which we compute m bootstrap estimates of N, say fN ? 1 ; N ? 2 ; . . .; N ?m g.We then use the 2.5 th and 97.5 th percentiles of these N ?i to obtain the 95% individual bootstrapped confidence interval for N. Compared with the parametric bootstrap, this bootstrap of individuals is less dependent on the capture probability model assumption as it does not use this assumption in the resampling step of the operation.
Numerical integration was used to obtain capture probability estimates π g (x).Further, to obtain consonance regions for model parameters θ and N we performed a grid search.More details are provided in the S1 File and examples of code are provided on Github at https:// github.com/ILR819/white_stork.

Accuracy of the chi-square approximation
In real applications, the accuracy of the chi-square approximation to the finite sample distributions of chi-square statistics in (1) and (3) depends on the size of n and N, respectively.For (1), it is well known that when n is sufficiently large so that the observed cell counts O i are all more than 5, the approximation is good.For (3), the accuracy of the approximation depends on not only N but also the number of parameters estimated d and the number of categories or cells T + 1 in the chi-square statistic, so it is important to examine the accuracy empirically for the combination of (N, d, T) that we are interested in before we use the asymptotic chi-square distribution to construct consonance intervals for the unknown N.
We now present a simulation study on the accuracy of the chi-square approximation for (3) for the case where N = 4000 and T = 10.These N and T values were chosen because of the prior information about N concerning the white stork population and the data available.For the capture probability distribution, we chose beta(1, 10) as simulation results show that it produces similar observed frequency vector f c to the white stork data.With the chosen N, T and beta(1,10), we first generate multi-occasion capture-recapture data as described in the parametric bootstrap, and then pretend the parameter values of the beta distribution are unknown and estimate them using beta(α, β) as the unknown capture probabililty distribution with the minimum chi-square estimator θN .The resulting X 2 ðN; θN Þ value is a random observation of the partial minimum chi-square statistic (3).We repeated this process 200 times, obtaining 200 sets of simulated multi-occasion capture-recapture data and 200 random observations of the X 2 ðN; θN Þ statistic.These 200 observations form a random sample of size 200 from the null distribution of the partial minimum chi-square statistic X 2 ðN; θN Þ.We then used this sample of size 200 to determine the accuracy of the chi-square approximation through QQ-plots which plot the sample quantiles against that of the corresponding quantiles of the asymptotic chi-square distribution.Recall that the degrees of freedom of the partial minimum chi-square statistic X 2 ðN; θN Þ in (3) equals the total number of cells C = T + 1 minus (d + 1) where d is the dimension of θ.Here, in order to have a positive degrees of freedom and to ensure cell counts O i are not too small, we pooled the data (aggregating counts for some sampling occasions) into a total of C = 4, 5, 6, 7 cells; see the white stork data analysis in the next section for examples of such aggregation.With d = 2 for the beta model, the degrees of freedom of the asymptotic chi-square distribution are df = 1, 2, 3, 4, respectively.
Fig 1 shows the QQ-plots of the sample of 200 for the four cases.We see from the plots that for the present combination of N, T and capture probability distribution, the asymptotic chisquare approximation to the finite sample distribution of the partial minimum chi-square statistic X 2 ðN; θN Þ is not accurate.In particular, for C = 4, the statistic X 2 ðN; θN Þ has very small values, indicating the chi-square approximation is very poor.With more cells the approximation becomes better but still not accurate enough for constructing consonance intervals for N. Nevertheless, we note that the simulated quantiles of X 2 ðN; θN Þ tend to be smaller than the corresponding chi-square quantiles when the underlying capture probability distribution is correctly specified, regardless the total number of cells C. Thus this statistic is still useful for testing whether or not the capture probability distribution used is correct; if an observed value of the statistic exceeds say the 0.95th quantile of the asymptotic chi-square distribution, then the capture probability model should be rejected at 5% level based on this empirical finding.This test is conservative in that its type-I error is less than 5% but it may not be very powerful against misspecified capture probability models.Nevertheless, in the absence of other tests for the capture probability model, we will apply this test when analyzing the white stork data.
Finally, the poor accuracy of the chi-square approximation for the present combination of N, T and capture probability model does not invalidate the minimum chi-square method as a method of point estimation.It simply indicates the asymptotic chi-square distribution cannot be used to calibrate the consonance interval for this particular combination.Further, there are many situations where the chi-square approximation has been found to be accurate when the population size N is large.See the S1 File for examples where the chi-square approximation is accurate.

Bootstrap techniques for interval estimation
Since the chi-square approximation cannot be used to compute confidence intervals for the above combination of N, T and capture probability model, we use the bootstrap instead.To see that the bootstrap is effective, we first generated 100 sets of multi-occasion capture-recapture data and used these to obtain 100 pairs of estimated parameters values fð N 1 ; θ1 Þ, ð N 2 ; θ2 Þ, . .., ð N 100 ; θ100 Þg.Each pair of estimated parameters ð N i ; θi Þ defines an estimated population from which we resample, either through parametric bootstrap or bootstrap of individuals, to generate 1000 sets of capture history data with which we produce 1000 estimates of the population size N ?i;j for j = 1, 2, . .., 1000.We then constructed a 95% confidence interval using percentiles of N ?i;j as described in Section 2.3, resulting in 100 different bootstrapped intervals f½N ?1;L ; N ?1;U �; ½N ?2;L ; N ?2;U �; . .., ½N ?100;L ; N ?100;U �g.To see these are reasonable bootstrap intervals, we plotted the point estimate N i against the "centre-point" of the interval ½N ?i;L ; N ?i;U � represented by the median of fN ?i;1 ; N ?i;2 ; . . .; N ?i;1000 } which we denote with N ?i;M .Fig 2 shows such plots of ð N i ; N ?i;M Þ when the number of cells equals to 4, 5, 6, or 7, respectively.The lines in each plot is the y = x line.We see that regardless of the number of cells, the ð N i ; N ?i;M Þ pairs are located around the y = x line indicating that the point estimate N i is in the centre of the bootstrap confidence interval.Also, among the 100 confidence intervals for each C value, the percentage of the intervals containing the true population size N = 4000 is around 95% when the capture probability distribution is correctly specified, very close to the confidence level of 95%.We did the above simulation and plots for other combinations of N, T and capture probability distribution and obtained similar results.
Fig 3 shows the histograms of randomly selected sets of bootstrap estimates fN ?i;1 ; N ?i;2 ; . . .; N ?i;1000 } for the 4-cells, 5-cells, 6-cells, and 7-cells cases, respectively.It is clear that the shape of the histogram for each case is non-normal.The test statistic and the corresponding p-value of Shapiro-Wilk normality test underneath each histogram further support this observation.This is consistent with the skewness of the distribution of population size estimators for all proposed methods in the capture-recapture literature; see [28].Because of this, normal based confidence intervals cannot be used.When the asymptotic chi-square approximation is accurate, we can use the consonance set (4) as our confidence interval.When it is not, such as in our present case, we need to use the above bootstrap based confidence intervals instead.

Bias in population size estimation
Through a thorough simulation study that varied population size N, E(p), and the generating capture probability distribution, we investigated robustness in terms of bias and root meansquare error of N (see Section 2.3 of the S1 File).In most cases, the minimum chi-square method outperforms the likelihood methods.In particular, when population size N is large and expected capture-probability E(p) is high, the minimum chi-square method outperforms the likelihood methods regardless of the number of cells used.
Application to the white stork data [12] conducted a study on the foraging strategies of white storks (Ciconia ciconia) as a closed population in southwestern Spain.They investigated individual foraging specialization.The data were collected by two observers within the white stork's main wintering area in southwestern Spain.At each sampling occasion, they recorded a "1" if marked storks were observed at rice fields, a "2" if marked storks were observed at dumps, and a "0" if marked storks were not detected in a particular occasion.In total, 1684 different individuals were banded on 80 sampling occasions during the study period lasting 80 days.[12] found apparent survival rate was close to 1 during the winter study concluding that a closure assumption was valid.Due to the large study area and the large number of sampling occasions, the data are sparse.So we aggregated the daily white storks capture history data to reduce the number of sampling occasions from T = 80 to T = 10.This was done by pooling, for example, the first 8 days' observations together to form the observation for the first (aggregated) sampling occasion; a white stork is recorded as "captured" on this sampling occasion if it was captured at least once in the first 8 days, and recorded as "not captured" otherwise.After the aggregation, the observed frequency vector f c = {1021, 420, 166, 50, 20, 6, 1, 0, 0, 0}.To avoid zero count cells and small cell The 4-cell case, for example, includes a cell for unobserved individuals whose count is unknown.
An assumption often made in the modelling of capture-recapture data is that of homogeneity in the capture probability.The homogeneity model postulates that the capture probability p does not vary from one individual to another and over sampling occasions.Under this model, g(p; θ) is a point mass of 1 at p, and π g (x) is p g ðxÞ ¼ T x À � p x ð1 À pÞ TÀ x : Under this assumption, we computed the estimated N by using the minimum chi-square method, the full likelihood method, and the conditional likelihood method (Table 1).The full likelihood method gives an estimated population size of 2434 white storks and the capture probability p is estimated to be 0.11.The conditional likelihood method gives an estimate of 2433 with an estimated capture probability of 0.11 as well.For the minimum chi-square method, estimate of the population size gradually decreases from 2500 to 2344 as the number of cells increases which are largely in agreement with that given by the full and conditional likelihood methods.Moreover, the minimum chi-square statistics as a measure of goodness-of-fit is large relative to the corresponding chi-square distribution for all four cases.Consequently, the p-values are highly significant which suggests that heterogeneity in capture probability may exist and the point estimate obtained based on homogeneity assumption is not reliable.Note that this information is not available if we use the likelihood methods alone.
To take the heterogeneity in capture probabilities into consideration, we first consider modelling the heterogeneity using the beta distribution as it can take on many different shapes.The probability density function of the beta distribution is g(p; θ) = g(p; α, β) = Γ(α + β)/{Γ(α) Γ(β)}p α−1 (1 − p) β−1 where α > 0 and β > 0 are two shape parameters.Table 2 shows the results under the beta model for capture probability.The minimum chi-square estimate of the population size does not change very much with the C value indicating it is robust against the number of cells used.The small minimum chi-square statistics and the corresponding large pvalues (close to 1) indicate that the beta model is appropriate for the capture probability.The minimum chi-square confidence intervals are also robust against the number of cells used.Both the minimum chi-square point estimate of N and the 95% CI are consistent with the full Table 1.Maximum full likelihood, maximum conditional likelihood and minimum chi-square estimates under the homogeneity assumption for capture probability.C is the number of cells after pooling counts in the right tail of f c , N is the estimated value of N and p is the estimated capture probability p.The chi-square statistic X 2 , its degree of freedom df and p-value summarize the minimum chi-square goodness-of-fit test for the homogeneity model for capture probabilities.and conditional likelihood methods.To summarize, the minimum chi-square statistic shows that the heterogeneity in capture probability is present and the beta distribution is an appropriate model for the heterogeneity.It also provides a credible estimate of the white stork population size that is similar to the likelihood estimates and agrees with the census findings of 4000 [2].We now use the logit-normal distribution to model the heterogeneity of the capture probability.Its density function is gðp; θÞ ¼ gðp; m; sÞ ¼ 1=ðs ffi ffi ffi ffi ffiffi 2p p Þ � 1=fpð1 À pÞgexpð½À logfp=ð1 À pÞg À m�=ð2s 2 ÞÞ; where μ is the mean and σ > 0 is the standard deviation.We estimated the white stork population size using the minimum chi-square method, and the full and conditional likelihood methods with this model (Table 3).The point estimates of the model parameters as well as the population size are quite stable for all cases.The relatively large p-values of the minimum chi-square method indicates the logit-normal model is also acceptable for the white stork data.For the minimum chi-square method, the confidence intervals become stable as the number of cells increases for both bootstrap techniques.When the number of cells C = 7, the two bootstrap intervals are only slightly wider than the ones given by the likelihood methods.The point estimates under the logit-normal case range from about 3350 to 3500, which are 500 to 600 individuals fewer than that under the beta model or Table 2. Maximum full likelihood, maximum conditional likelihood and minimum chi-square estimates under the beta model for capture probability.C is the number of cells after pooling counts in the right tail of f c and N is the minimum chi-square point estimator of N. The chi-square statistic X 2 , its degree of freedom df and pvalue summarize the minimum chi-square goodness-of-fit test for the beta model.Interval estimates and their widths using parametric bootstrap and bootstrap of individuals are also shown.C is the number of cells after pooling the right tail of f c and N is the point estimator of N. The chi-square statistic X 2 , its degree of freedom df and p-value summarize the minimum chi-square goodness-of-fit test for the logit-normal model.Interval estimates and their widths using parametric bootstrap and bootstrap of individuals are also shown.the census findings.Most of the confidence intervals do not contain the number 4000, but they are narrower than the confidence intervals under the beta model.

Conclusion
We explored the minimum chi-square method for estimating a closed population size with multi-occasion capture-recapture data as an alternative method to likelihood based methods.The advantage of the minimum chi-square method is that it not only estimates the unknown population size but also performs goodness-of-fit test on the capture probability model.Further, the partial minimum chi-square statistic has an asymptotic chi-square distribution which can be easily used to construct consonance/confidence interval when the chi-square approximation is accurate.It is important to examine the accuracy of the chi-square approximation through simulations.When the accuracy is unsatisfactory, which may happen when the population is small, we use the bootstrap methods to construct confidence intervals.
In the white stork application, the estimated population size dropped by more than 10% for all methods when we changed the capture probability model from beta to logit-normal and yet both beta and logit-normal models seem to fit the data well.This raises the question of which estimate and which model we should trust.This is related to the nonidentifiability problem we noted in the introduction.While more work is needed to address this problem, through extensive simulation [29] showed that the minimum chi-square estimator N is robust against misspecification of the capture probability model g(p; θ) for the case of a large population size of N = 10000 and an expected capture probability E(p) that is not too small; that is, we obtain good estimates of N even when the capture probability model is misspecified.In this case, the minimum chi-square method outperforms the existing likelihood based estimators in terms of bias and mean square error.The asymptotic chi-square distribution is also accurate and the associated consonance set R N 1À a also performs well in terms of coverage accuracy.See the S1 File for some of the simulation results from [29].Nevertheless, closed populations tend to be small (e.g. a confined population of 135 cottontail rabbit in [30]; an estimated closed population size of 173 of meadow vole in [2]).Thus the bootstrap approach to inference is necessary for these cases.But for population sizes at or above 10000, the consonance interval of the minimum chi-square method is a more natural interval estimate for the population size.
Our beta model based analysis of the white stork data provided supporting evidence to the existing estimate of N = 4000 obtained by the scientists through other means.Because the logit-normal model based estimates differ substantially from this number, we conclude that the capture probability of the white storks likely follows a beta distribution.
Finally, we note that the model we considered assumes only individual heterogeneity among capture probabilities.We did not consider the additional complexity of time-dependent individual heterogeneity in capture probability which would require restructuring the model.We leave this as an interesting future research direction.

Fig 1 .
Fig 1. QQ-plots of a sample of n = 200 partial minimum chi-square statistic values versus quantiles of the asymptotic chisquare distribution for C = 4, C = 5, C = 6, and C = 7, respectively, where N = 4000, T = 10, and the capture probability distribution is beta.The red line in each plot is the y = x line.https://doi.org/10.1371/journal.pone.0292622.g001

Fig 2 .
Fig 2. The geometry of the bootstrap confidence intervals: Plots of the 100 point estimates N i versus the "centre-point" of the confidence interval N ?i;M for C = 4, C = 5, C = 6, and C = 7 where N = 4000, T = 10 and the capture probability distribution is beta.The red line is the y = x line.https://doi.org/10.1371/journal.pone.0292622.g002

Fig 3 .
Fig 3. Histograms of samples of size 1000 bootstrap minimum chi-square estimates for the population size for C = 4, C = 5, C = 6, and C = 7, respectively, where N = 4000, T = 10 and the capture probability model is beta.Below the plots are the test statistics and the corresponding p-values of Shapiro-Wilk test for the null hypothesis that the bootstrap estimates are normally distributed.https://doi.org/10.1371/journal.pone.0292622.g003